Last updated: 2023-06-07

Checks: 6 1

Knit directory: DHT_fibroblast_snRNAseq/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20230528) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 2a24267. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/mf43/
    Ignored:    data/mf61/
    Ignored:    data/sca.rds
    Ignored:    output/fibroblast_seurat.rds

Untracked files:
    Untracked:  .DS_Store
    Untracked:  analysis/Differential_Expression.Rmd
    Untracked:  analysis/Preprocess_fibro_from10x.Rmd
    Untracked:  analysis/Subtype_Prediction.Rmd
    Untracked:  analysis/make_gsNetwork.R
    Untracked:  analysis/plot_commu_temp.R
    Untracked:  data/.DS_Store
    Untracked:  data/AR_sig_down.csv
    Untracked:  data/AR_sig_up.csv
    Untracked:  data/Wu_2020/
    Untracked:  data/genesGR.rds
    Untracked:  output/Table_Chap5_Supp3.xlsx
    Untracked:  output/pred_hpca.rds
    Untracked:  output/pred_wu.rds
    Untracked:  output/zlm_summary_mixed.rds

Unstaged changes:
    Modified:   .gitignore
    Modified:   README.md
    Modified:   analysis/_site.yml
    Deleted:    analysis/about.Rmd
    Modified:   analysis/index.Rmd
    Deleted:    analysis/license.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


library(tidyverse)
library(yaml)
library(scales)
library(pander)
library(glue)
library(edgeR)
library(AnnotationHub)
library(ensembldb)
library(cowplot)
library(ggfortify)
library(magrittr)
library(cqn)
library(ggrepel)
library(DT)
library(Seurat)
library(corrplot)
library(plotly)
library(patchwork)
library(colorspace)
library(ggpubr)
library(randomcoloR)
library(ggforce)
library(MAST)
library(pheatmap)
library(BiocParallel)
library(patchwork)
library(biomartr)
panderOptions("table.split.table", Inf)
panderOptions("big.mark", ",")
theme_set(
    theme_bw() +
              theme(
                  panel.background = element_blank(), 
                   panel.grid = element_blank()
              ))

Color palettes used by most scRNA-seq methods could be found here.

# sub_cols <- c("#729ECE", "#FF9E4A", "#67BF5C", "#ED665D", "#AD8BC9") %>%
#     set_names(unique(seurat[[]]$prediction))
patient_col <- c(
    "MF61" = "#A71B4B", 
    "MF43" = "#584B9F"
)
# scanpy colors were found in https://github.com/scverse/scanpy/blob/034ca2823804645e0d4874c9b16ba2eb8c13ac0f/scanpy/plotting/palettes.py
treat_col <- c(
  VEH = rgb(0.7, 0.7, 0.7),
  DHT = rgb(0.8, 0.2, 0.2))
sample_col <- hcl.colors(n = 4) %>%
    set_names(c("mf43DHT", "mf43VEH", "mf61DHT", "mf61VEH"))
ah <- AnnotationHub() %>%
  subset(rdataclass == "EnsDb") %>%
  subset(str_detect(description, "101")) %>%
  subset(genome == "GRCh38")
stopifnot(length(ah) == 1)
ensDb <- ah[[1]]

Gene Annotations

genesGR <- read_rds(here::here("data/genesGR.rds"))

Load 10x data

Outputs of the 10X cellranger pipelines were stored in different directories for cells derived from each of the four samples. Those raw reads were read in using the Read10X() function and made into individualSeurat object with the min.cells parameter set to 3 and min.features parameter set to 200 so only features detected in at least 3 cells and cells with at least 200 features within each sample were kept.

The 4 individual Seurat object were then combined to one using the merge() function. The merge step simply concatenates the counts and cell metadata together.Unique gens will also be added to the merged objects.

mf43DHT_rep1 <- Read10X(data.dir = "~/pilot_fibroblast_scRNAseq/data/mf43/MF43_1/MF43_DHT-1/")
mf43VEH_rep1 <- Read10X(data.dir = "~/pilot_fibroblast_scRNAseq/data/mf43/MF43_1/MF43_VEH-1/")
mf61DHT <- Read10X(data.dir = "~/pilot_fibroblast_scRNAseq/data/mf61/MF61D1-G/")
mf61VEH <- Read10X(data.dir = "~/pilot_fibroblast_scRNAseq/data/mf61/MF61V1-G/")
mf43DHT <- CreateSeuratObject(counts = mf43DHT_rep1, project = "mf43DHT", min.cells = 3, min.features = 200)
mf43VEH <- CreateSeuratObject(counts = mf43VEH_rep1, project = "mf43VEH", min.cells = 3, min.features = 200)
mf61DHT <- CreateSeuratObject(counts = mf61DHT, project = "mf61DHT", min.cells = 3, min.features = 200)
mf61VEH <- CreateSeuratObject(counts = mf61VEH, project = "mf61VEH", min.cells = 3, min.features = 200)
fibroblast <- merge(mf43DHT, y = c(mf43VEH, mf61DHT, mf61VEH), project = "pilot_fibroblast")

The merged object contained data for 27381 features across 29128 cells.

Preprocess

fibroblast[["percent.mt"]] <- PercentageFeatureSet(fibroblast, pattern = "^MT-")

The QC metrics were plotted as a violin plot.

QCplot <- VlnPlot(fibroblast, 
                  features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), 
                  cols = sample_col,
                  log = TRUE,
                  ncol = 3, 
                  group.by = "orig.ident", 
                  combine = FALSE,
                  pt.size = 0)  
QCplot <- lapply(QCplot, function(x){x + guides(fill=guide_legend("Sample"))})
names(QCplot) <- c("nFeature_RNA", "nCount_RNA", "percent.mt")
(QCplot$nFeature_RNA & labs(y = "# of unique features/cell", x = "", title = "") |
    QCplot$nCount_RNA & labs(y = "# of gene-wise counts/cell", x = "", title = "") |
    QCplot$percent.mt & labs(y = "% of Mitochondrial counts/cell", x = "", title = "")) + 
    plot_layout(guides = "collect") + 
    plot_annotation(tag_levels = 'A')
*Numers of features, total numbers of counts and percentages of mitocondrial counts for cells with in each sample.*

Numers of features, total numbers of counts and percentages of mitocondrial counts for cells with in each sample.

Cells that have unique feature counts over 4,000 or less than 200 and ones have >5% mitochondrial counts were filtered out.

fibroblast <- subset(fibroblast, subset = nFeature_RNA > 200 & nFeature_RNA < 4000 & percent.mt < 5)

26491 cells passed the filtration criteria.

fibroblast$Sample <- fibroblast[[]]$orig.ident
fibroblast$patient <- ifelse(str_detect(fibroblast[[]]$orig.ident, "mf43"), "MF43", "MF61")
fibroblast$Treatment <- ifelse(str_detect(fibroblast[[]]$orig.ident, "DHT"), "DHT", "VEH")

Significantly higher numbers of cells were derived from the two MF61 samples.

NumOfCells <- fibroblast[[]] %>%
    group_by(Treatment, patient, Sample) %>%
    summarise(n = n()) %>%
    ggplot(
        aes(Sample, n, fill = Treatment)
    ) +
    geom_bar(stat="identity", position=position_dodge()) +
    geom_text(aes(label=n), vjust=1.6, color="white",
              position = position_dodge(0.9), size=15) +
    theme_minimal() +
    labs(x = "",
         y = "Number of Cells") +
    # theme(
    #     axis.text.x = element_text(size = 12, angle = 90)
    # ) +
    scale_fill_manual(values = treat_col)
NumOfCells
*Numbers of cells in each samples, colored by treatment.Considerably more cells were from MF61.*

Numbers of cells in each samples, colored by treatment.Considerably more cells were from MF61.

Normalise & featue selection & clustering

Normalisation was performed to normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result.

fibroblast <- NormalizeData(fibroblast)

Feature selection was performed on the normalised object to only focus on the 2000 genes with top cell-to-cell variation in downstream analyses, and the 10 most highly variable genes were plotted on the variance-expression plot below.

fibroblast <- FindVariableFeatures(fibroblast, selection.method = "vst", nfeatures = 2000)
marker <- head(VariableFeatures(fibroblast), 10)
plot1 <- VariableFeaturePlot(fibroblast)
plot2 <- LabelPoints(plot = plot1, points = marker, repel = TRUE)
plot2
*Variance-expression relationship for all genes where the top 2000 highly variable features were colored by red and the gene names of the top 10 features were labelled by text*

Variance-expression relationship for all genes where the top 2000 highly variable features were colored by red and the gene names of the top 10 features were labelled by text

Next, the ScaleData() function was used to:

  • Shifts the expression of each gene, so that the mean expression across cells is 0
  • Scales the expression of each gene, so that the variance across cells is 1

This step gives equal weight in downstream analyses, so that highly-expressed genes do not dominate

all.genes <- rownames(fibroblast)
fibroblast <- ScaleData(fibroblast, features = all.genes)

PCA was then performed on the scaled data.

fibroblast <- RunPCA(fibroblast, features = VariableFeatures(object = fibroblast))
 DimPlot(fibroblast, 
        reduction = "pca",
        # group.by = "Treatment",
        pt.size = 0.1
        ) 
*PCA plots of all fibroblast cells from the pilot study*

PCA plots of all fibroblast cells from the pilot study

Chose PCA component

To choose how many PCA components to keep, DimHeatmap() function was firstly used to visualise the source of heterogeneity in a dataset, where cells and features were ordered according to their PCA scores for each PCA component.

DimHeatmap(fibroblast, dims = 1:20, cells = 500, balanced = TRUE)

Then the elbow plot visualising a ranking of principle components based on the percentage of variance explained by each one was plotted.

ElbowPlot(fibroblast)

Based on the plots above, I chose to use the first 18 PCs.

Cluster & Non-linear dimensional reduction

Cells were clustered using seurat’s FindNeighbors() and FindClusters() function with dims set to 18 and resolution set to 0.3.

fibroblast <- FindNeighbors(fibroblast, dims = 1:18)
fibroblast <- FindClusters(fibroblast, resolution = 0.3)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 26491
Number of edges: 804730

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9064
Number of communities: 7
Elapsed time: 4 seconds

7 unique clusters were identified.

Uniform manifold approximation and projection was perform for the cells and the dimensional reduction result was visualized.

fibroblast <- RunUMAP(fibroblast, dims = 1:15)
UMAP_clu <- DimPlot(fibroblast, reduction = "umap")
UMAP_clu 
*UMAP plot colored by clusters formed.*

UMAP plot colored by clusters formed.

The pre-processed Seurat object was saved as an RDS file.

fibroblast$cluster <- Idents(fibroblast)
saveRDS(fibroblast, file = here::here("output/fibroblast_seurat.rds"))

The UMAP plot was also colored by patient and treatment.

umap_p <- DimPlot(fibroblast, 
        reduction = "umap", 
        group.by = "patient", 
        pt.size = 0.1) +
    scale_color_manual(
        values = patient_col 
    )+
    ggtitle("")
umap_t <- DimPlot(fibroblast, 
        reduction = "umap", 
        group.by = "Treatment", 
        pt.size = 0.1
        ) +
    scale_color_manual(
        values = treat_col
    ) +
    ggtitle("")
UMAP_byPatient <- plot_grid(
    umap_p, umap_t , 
    labels = LETTERS[2:3],
    hjust = -1.5,
    vjust = 2
)
UMAP_byPatient
*UMAP plot colored by (L) patient and (R) treatment. The confounding effect of patient was still really strong.*

UMAP plot colored by (L) patient and (R) treatment. The confounding effect of patient was still really strong.

num_cell <- fibroblast[[]] %>%
    group_by(cluster, patient, Treatment) %>%
    summarise(
        n = n() )%>%
    ungroup() %>%
    group_by(cluster) %>%
    mutate(total = sum(n))

Proportion by patient

pie_ls <- num_cell %>%
    group_by(patient, cluster) %>%
    mutate(n_pass = sum(n)) %>%
    ungroup() %>%
    mutate(prop_pass = n_pass/total) %>%
    dplyr::select(cluster, patient, prop_pass) %>%
    unique()  %>%
    split(f = .$cluster) %>%
    lapply(function(x){
        ggplot(
            x, aes(x = "", y = prop_pass, fill = patient)
        ) +
            geom_bar(width = 1, stat = "identity") +
            scale_fill_manual(values = patient_col, name = "Patient") + 
            coord_polar("y", start=0) +
            theme_void()  
    })
plot_grid(
    plot_grid(
        plotlist = pie_ls %>%
            lapply(function(x){
                x + theme(
                    legend.position = "none"
                )
            }), 
        labels = paste("Cluster ", 0:6, sep = ""), 
        label_size = 12
    ), 
    get_legend(pie_ls[[1]]), 
    rel_widths = c(9,1))
*Propotion of cells derived from each patient for each of the 6 cluster. Cluster 1 was mostly MF43 cells while all the other 5 cluster were predominantly MF61 cells*

Propotion of cells derived from each patient for each of the 6 cluster. Cluster 1 was mostly MF43 cells while all the other 5 cluster were predominantly MF61 cells

Proportion by treatment

pie_ls2 <- num_cell %>%
    group_by(Treatment, cluster) %>%
    mutate(n_treat = sum(n)) %>%
    ungroup() %>%
    mutate(prop_treat = n_treat/total) %>%
    dplyr::select(cluster, Treatment, prop_treat) %>%
    unique()  %>%
    split(f = .$cluster) %>%
    lapply(function(x){
        ggplot(
            x, aes(x = "", y = prop_treat, fill = Treatment)
        ) +
            geom_bar(width = 1, stat = "identity") +
            scale_fill_manual(values = treat_col) + 
            coord_polar("y", start=0) +
            theme_void() 
    })
plot_grid(
    plotlist = pie_ls2, 
    labels = paste("Cluster ", 0:5, sep = "")
)
*Propotion of cells derived from each traetment group for each of the 6 cluster. Apart from cluster 1, there are more vehicle cells than DHT-treated one in all clusters.*

Propotion of cells derived from each traetment group for each of the 6 cluster. Apart from cluster 1, there are more vehicle cells than DHT-treated one in all clusters.

Marker Selection

Positive markers for every cluster compared to all remaining cells were found using the FindAllMarkers() function with min.pct set to 0.3 and logfc.threshold set to 1.2. The numbers of markers found for each cluster were:

cluster_marker <- FindAllMarkers(
    fibroblast, min.pct = 0.3, logfc.threshold = 1.2, only.pos = TRUE
)
cluster_marker  %>%
    split(f = .$cluster) %>%
    vapply(nrow, integer(1)) %>%
    pander()
0 1 2 3 4 5 6
2 10 11 8 2 1 13

Genes included in the heatmap are shown in the table below.

cluster_marker <- cluster_marker %>%
    dplyr::select(cluster, gene_name = gene) %>%
    left_join(
        genes(ensDb) %>%
            as.data.frame() %>%
            dplyr::select(
                gene_name, description, 
                gene_id
            ) %>%
            mutate(
                description = gsub("\\[.*?\\]", "", description)
            )
    ) %>%
    unique() 
# marker_GO <- getGO(
#     organism = "Homo Sapiens", 
#     genes = cluster_marker$gene_id,
#     filters  = "ensembl_gene_id"
# )
cluster_marker %>%
    dplyr::select(-gene_id) %>%
    # dplyr::filter(cluster == "1") %>%
    # pull(gene_name)
    unique() %>%
    mutate_all(as.factor) %>%
    datatable(
        filter = "top",
        options = list(scrollY = '450px')
    ) 

Expressions of all marker genes selected for each cluster combined were visualised as a heatmap.

marker_hp <- DoHeatmap(
    fibroblast, 
    features = cluster_marker$gene_name, 
    # label = FALSE, 
    group.by = "cluster", 
    angle = 0, 
    hjust = 0.5, 
    size = 24,
    group.bar.height = 0.05) 
marker_hp
*Expressions of marker genes selected for each cluster were visualised as a heatmap*

Expressions of marker genes selected for each cluster were visualised as a heatmap

Expressions of the marker genes for each cluster were visualised using the FeaturePlot() function.

marker_list <- cluster_marker %>%
    dplyr::select(cluster, gene_name) %>%
    split(f = .$cluster) %>%
    lapply(pull, gene_name) %>%
    lapply(unique)
names(marker_list) <- paste("Cluster", names(marker_list), sep = " ")
marker_pl <- sapply(names(marker_list), function(x){
    FeaturePlot(
    fibroblast, features = marker_list[[x]]
)
}, simplify = FALSE)
cluster_cols <- hue_pal()(length(marker_pl))
names(cluster_cols) <- names(marker_pl)
UMAP_pl <- sapply(names(cluster_cols), function(x){
    cols <- ifelse(names(cluster_cols) == x, cluster_cols[[x]], "gray")
    DimPlot(
        fibroblast, 
        reduction = "umap",  
        cols = cols
    )
}, simplify = FALSE)
cluster_ls <- sapply(names(marker_pl), function(x){
    plot_grid(
        marker_pl[[x]], 
        UMAP_pl[[x]], 
        nrow = 1, 
        rel_widths = c(2, 1))
}, simplify = FALSE)
htmltools::tagList(
    lapply(names(cluster_ls), function(y){
                        cp <- htmltools::tags$em(glue(
                            "Expressions of the top 10 marker genes selected for cluster {y} on UMAP plots with a UMAP highlighting where cluster {y} cells locate. "
                        ))
                        htmltools::div(
                            id = y,
                            class="section level4",
                            htmltools::h4(class = "tabset", y),
                            htmltools::div(
                                class = "figure", style = "text-align: center",
                                htmltools::plotTag(cluster_ls[[y]], alt = "test", width = 1500, height = 1000),
                                htmltools::p(
                                    class = "caption", htmltools::tags$em(cp)
                                )
                            )
                        )
                    })
                )

Cluster 0

test

Expressions of the top 10 marker genes selected for cluster Cluster 0 on UMAP plots with a UMAP highlighting where cluster Cluster 0 cells locate.

Cluster 1

test

Expressions of the top 10 marker genes selected for cluster Cluster 1 on UMAP plots with a UMAP highlighting where cluster Cluster 1 cells locate.

Cluster 2

test

Expressions of the top 10 marker genes selected for cluster Cluster 2 on UMAP plots with a UMAP highlighting where cluster Cluster 2 cells locate.

Cluster 3

test

Expressions of the top 10 marker genes selected for cluster Cluster 3 on UMAP plots with a UMAP highlighting where cluster Cluster 3 cells locate.

Cluster 4

test

Expressions of the top 10 marker genes selected for cluster Cluster 4 on UMAP plots with a UMAP highlighting where cluster Cluster 4 cells locate.

Cluster 5

test

Expressions of the top 10 marker genes selected for cluster Cluster 5 on UMAP plots with a UMAP highlighting where cluster Cluster 5 cells locate.

Cluster 6

test

Expressions of the top 10 marker genes selected for cluster Cluster 6 on UMAP plots with a UMAP highlighting where cluster Cluster 6 cells locate.

Fibroblast sites of origin markers

In the paper by Morsing M, et al. 2016, lobular human breast fibroblasts were found to be CD105high/CD26low while interlobular fibroblasts are CD105low/CD26high.

The expressions of the 2 genes across all cells were plotted:

library(flextable)
eng <- FeaturePlot(
    fibroblast, features = "ENG") +
    ggtitle(
        paste0("*ENG*", " (Lobular Fibroblast Marker)")
    ) +
    mdthemes::md_theme_classic() +
    theme(
        legend.position = "none"
    )
dpp4 <- FeaturePlot(
    fibroblast, features = "DPP4") +
   ggtitle(
        paste0("*DPP4*", " (Interlobular Fibroblast Marker)")
    ) +
    mdthemes::md_theme_classic() 
lobular_marker <- plot_grid(
        eng, dpp4, 
        nrow = 1
    )
lobular_marker
*Expression of lobular fibroblast marker gene ENG(CD105) and interlobular fibroblast marker gene DPP4 (CD26) among all cells *

Expression of lobular fibroblast marker gene ENG(CD105) and interlobular fibroblast marker gene DPP4 (CD26) among all cells

Most of the sequenced cells had a lobular origin, while only a small proportion of cells may have been interlobular.

Thesis figures

# # Table 1 number of cells per cluster
# num_cell %>%
#     dplyr::select(cluster, total) %>%
#     mutate(cluster = paste("Cluster", cluster, sep = " ")) %>%
#     unique() %>%
#     ungroup() %>%
#     add_row(
#         summarise(
#             ., 
#             cluster = "Total", 
#             total = sum(total)
#         )
#     ) %>%
#     dplyr::rename(
#         Cluster = cluster, 
#         `Number of cells` = total
#     ) %>%
#     xtable::xtable(
#         caption = "The numbers of cells assigned to each mammary fibroblasts snRNA-seq cluster.The numbers of cells assigned to each mammary fibroblasts snRNA-seq cluster. Clusters were identified through a shared nearest neighbor (SNN) modularity optimization based clustering algorithm provided by the Seurat's FindClusters function, with the resolution set to 0.3.",
#         label = "chap5_table1"
#     )  %>%
#     print("~/PhD_thesis/draft_figure/chapter_05/table1.tex", type = "latex",
#           caption.placement = "top",
#           include.rownames = FALSE,
#           # booktabs = TRUE
#     )
## Table 2 (Marker genes defining each cluster)
# library(kableExtra)
# cluster_marker %>%
#     mutate(
#         cluster = paste("Cluster", cluster, sep = " "), 
#         description = gsub("\\[.*\\]", "", description)
#     ) %>%
#     # left_join(
#     #     kg
#     # ) %>%
#     dplyr::select(
#         Cluster = cluster, 
#         `Gene name` = gene_name, 
#         Description = description,
#         # `Gene-set name` = gs_name
#     ) %>%
#     unique() %>%
#     xtable::xtable(
#         caption = "Top marker genes selected for each cluster in the single-nucleus mammary gland fibroblast RNA-seq data]{The top 10 genes with the highest mean log2FC for clusters where more than 10 marker genes were detected, or all markers for clusters with a lower number of marker genes, along with each gene's description retrieved from Ensembl release 101.",
#         label = "chap5_table1", 
#         longtable = TRUE, 
#         type = "latex",
#         booktabs = TRUE
#     )  %>%
#     print("~/PhD_thesis/draft_figure/chapter_05/table2.tex",
#           caption.placement = "top",
#           type = "latex",
#           include.rownames = FALSE
#     )
library(patchwork)
chap5_fig2 <- ((UMAP_clu | plot_spacer()| umap_p) +
                   plot_layout(
                       widths = c(0.5,0.05, 0.45))) / 
    ( umap_t | NumOfCells)  +
    plot_annotation(
        tag_levels = "A"
    ) & 
    theme(
        # axis.title = element_text(size = 20), 
        # legend.key.size = unit(1, 'cm'), 
        # legend.text = element_text(size = 16), 
        # legend.title = element_text(size = 18),
        text = element_text(size = 50),
        plot.tag = element_text(size = 55, face = "bold" )
    )
# png(
#     "/Users/wenjunliu/PhD_thesis/Images/Chapter_05/chap5_fig2.png",
#     height = 2000,
#     width = 2800)
chap5_fig2
*Figure 2. UMAP plots coloured by (A) clusters formed, (B) patient origin and (C) treatment, and (D) the numbers of useable cells detected in each sample. A cell was considered useable if it has >200 and <43,000 unique feature counts, in addition to <5% mitochondrial genes.*

Figure 2. UMAP plots coloured by (A) clusters formed, (B) patient origin and (C) treatment, and (D) the numbers of useable cells detected in each sample. A cell was considered useable if it has >200 and <43,000 unique feature counts, in addition to <5% mitochondrial genes.

# dev.off()
# Figure 2 (marker genes)
#     nrow = 2)
chap5_fig3 <-  (eng | dpp4) +
    plot_annotation(tag_levels = "A") & 
    theme(
        text = element_text(size = 50),
        plot.tag = element_text(size = 55, face = "bold" ), 
        legend.key.size = unit(2, "cm")
    )

# png(
#     "/Users/wenjunliu/PhD_thesis/Images/Chapter_05/chap5_fig3.png",
#     height = 1400,
#     width = 2800
# )
chap5_fig3
*Figure 3. UMAP plots showing the expressions of (A) lobular marker ENG and (B) interlobular marker DPP4.*

Figure 3. UMAP plots showing the expressions of (A) lobular marker ENG and (B) interlobular marker DPP4.

# dev.off()
chap5_fig6 <- marker_hp +
    guides(
        color = "none"
    ) +
    theme(
        text = element_text(size = 40), 
        legend.key.size = unit(3, "cm"), 
        panel.border = element_blank()
    )
# png(
#     "/Users/wenjunliu/PhD_thesis/Images/Chapter_05/chap5_fig6.png",
#     height = 2400,
#     width = 2800
# )
# chap5_fig6
# dev.off()
# Supplementary Figure 1 (QC violin plots + elbow plot)

chap5_sup1 <- ((QCplot$nFeature_RNA & labs(y = "# of unique features/cell", x = "", title = "") |
    QCplot$nCount_RNA & labs(y = "# of total counts/cell", x = "", title = "")) )  / 
    ((QCplot$percent.mt & labs(y = "% of Mitochondrial counts/cell", x = "", title = "") &
          theme(
              axis.title.y = element_text(size = 45)
          ))  |
         ElbowPlot(fibroblast) ) + 
    plot_layout(guides = "collect") +
    plot_annotation(tag_levels = 'A') &
    theme(
        text = element_text(size = 50),
        plot.tag = element_text(size = 55, face = "bold" ), 
        legend.key.size = unit(2, "cm")
    ) +
    theme(
        axis.text = element_text(size = 35)
    )
png(
    "/Users/wenjunliu/PhD_thesis/Images/Chapter_05/chap5_sup1.png",
    height = 1500,
    width = 2000
)
chap5_sup1
dev.off()
quartz_off_screen 
                2 
bar_byPatient <- num_cell %>%
    group_by(patient, cluster) %>%
    mutate(n_pat = sum(n)) %>%
    ungroup() %>%
    ggplot(
        aes(cluster, n_pat, fill = patient)
    ) +
    geom_bar(stat="identity", position=position_dodge()) +
    # geom_text(aes(label=n), vjust=1.6, color="white",
    #           position = position_dodge(0.9), size=3.5) +
    theme_minimal() +
    labs(x = "Cluster",
         y = "Number of Cells") +
    scale_fill_manual(values = patient_col, 
                      name = "Patient") +
    theme(
        panel.grid = element_blank(),
        legend.margin=margin(0,0,0,0),
        legend.box.margin=margin(-10,10,-10,-10)
    )
bar_byTreat <- num_cell %>%
    group_by(Treatment, cluster) %>%
    mutate(n_treat = sum(n)) %>%
    ungroup() %>%
    ggplot(
        aes(cluster, n_treat, fill = Treatment)
    ) +
    geom_bar(stat="identity", position=position_dodge()) +
    # geom_text(aes(label=n), vjust=1.6, color="white",
    #           position = position_dodge(0.9), size=3.5) +
    theme_minimal() +
    labs(x = "Cluster",
         y = "Number of Cells") +
    scale_fill_manual(values = treat_col, 
                      name = "Treatment") +
    theme(
        panel.grid = element_blank(), 
        legend.margin=margin(0,0,0,0),
        legend.box.margin=margin(-10,0,-10,-10)
    )
chap5_sup2 <- (bar_byPatient | bar_byTreat) + 
    plot_layout(guides = "collect") +
    plot_annotation(tag_levels = 'A') &
    theme(
        text = element_text(size = 50),
        # axis.text.x = element_text(size = 50),
        plot.tag = element_text(size = 55, face = "bold" ), 
        legend.key.size = unit(2, "cm")
    )
# png(
#     "/Users/wenjunliu/PhD_thesis/Images/Chapter_05/chap5_sup2.png",
#     height = 1200,
#     width = 2200
# )
chap5_sup2
*Supp Fig 2. Numbers of fibroblast cells assigned to each cluster, grouped by (A) source of origin and (B) treatment.*

Supp Fig 2. Numbers of fibroblast cells assigned to each cluster, grouped by (A) source of origin and (B) treatment.

# dev.off()

Seesion Information

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Adelaide
tzcode source: internal

attached base packages:
[1] splines   stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] flextable_0.9.1             biomartr_1.0.3             
 [3] BiocParallel_1.34.1         pheatmap_1.0.12            
 [5] MAST_1.26.0                 SingleCellExperiment_1.22.0
 [7] SummarizedExperiment_1.30.1 MatrixGenerics_1.12.0      
 [9] matrixStats_0.63.0          ggforce_0.4.1              
[11] randomcoloR_1.1.0.1         ggpubr_0.6.0               
[13] colorspace_2.1-0            patchwork_1.1.2            
[15] plotly_4.10.1               corrplot_0.92              
[17] SeuratObject_4.1.3          Seurat_4.3.0               
[19] DT_0.27                     ggrepel_0.9.3              
[21] cqn_1.46.0                  quantreg_5.95              
[23] SparseM_1.81                preprocessCore_1.62.1      
[25] nor1mix_1.3-0               mclust_6.0.0               
[27] magrittr_2.0.3              ggfortify_0.4.16           
[29] cowplot_1.1.1               ensembldb_2.24.0           
[31] AnnotationFilter_1.24.0     GenomicFeatures_1.52.0     
[33] AnnotationDbi_1.62.1        Biobase_2.60.0             
[35] GenomicRanges_1.52.0        GenomeInfoDb_1.36.0        
[37] IRanges_2.34.0              S4Vectors_0.38.1           
[39] AnnotationHub_3.8.0         BiocFileCache_2.8.0        
[41] dbplyr_2.3.2                BiocGenerics_0.46.0        
[43] edgeR_3.42.2                limma_3.56.1               
[45] glue_1.6.2                  pander_0.6.5               
[47] scales_1.2.1                yaml_2.3.7                 
[49] lubridate_1.9.2             forcats_1.0.0              
[51] stringr_1.5.0               dplyr_1.1.2                
[53] purrr_1.0.1                 readr_2.1.4                
[55] tidyr_1.3.0                 tibble_3.2.1               
[57] ggplot2_3.4.2               tidyverse_2.0.0            
[59] workflowr_1.7.0            

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2             progress_1.2.2               
  [3] goftest_1.2-3                 Biostrings_2.68.0            
  [5] vctrs_0.6.2                   spatstat.random_3.1-4        
  [7] digest_0.6.31                 png_0.1-8                    
  [9] git2r_0.32.0                  deldir_1.0-6                 
 [11] httpcode_0.3.0                parallelly_1.35.0            
 [13] MASS_7.3-60                   fontLiberation_0.1.0         
 [15] reshape2_1.4.4                httpuv_1.6.10                
 [17] withr_2.5.0                   xfun_0.39                    
 [19] ellipsis_0.3.2                survival_3.5-5               
 [21] commonmark_1.9.0              memoise_2.0.1                
 [23] crul_1.3                      MatrixModels_0.5-1           
 [25] systemfonts_1.0.4             ragg_1.2.5                   
 [27] zoo_1.8-12                    V8_4.3.0                     
 [29] pbapply_1.7-0                 R.oo_1.25.0                  
 [31] prettyunits_1.1.1             KEGGREST_1.40.0              
 [33] promises_1.2.0.1              httr_1.4.6                   
 [35] rstatix_0.7.2                 restfulr_0.0.15              
 [37] globals_0.16.2                fitdistrplus_1.1-11          
 [39] ps_1.7.5                      rstudioapi_0.14              
 [41] miniUI_0.1.1.1                generics_0.1.3               
 [43] base64enc_0.1-3               processx_3.8.1               
 [45] curl_5.0.0                    zlibbioc_1.46.0              
 [47] polyclip_1.10-4               GenomeInfoDbData_1.2.10      
 [49] interactiveDisplayBase_1.38.0 xtable_1.8-4                 
 [51] evaluate_0.21                 S4Arrays_1.0.4               
 [53] hms_1.1.3                     irlba_2.3.5.1                
 [55] filelock_1.0.2                ROCR_1.0-11                  
 [57] reticulate_1.28               spatstat.data_3.0-1          
 [59] lmtest_0.9-40                 later_1.3.1                  
 [61] lattice_0.21-8                spatstat.geom_3.2-1          
 [63] future.apply_1.10.0           getPass_0.2-2                
 [65] scattermore_1.0               XML_3.99-0.14                
 [67] RcppAnnoy_0.0.20              pillar_1.9.0                 
 [69] nlme_3.1-162                  compiler_4.3.0               
 [71] stringi_1.7.12                tensor_1.5                   
 [73] GenomicAlignments_1.36.0      plyr_1.8.8                   
 [75] crayon_1.5.2                  abind_1.4-5                  
 [77] BiocIO_1.10.0                 ggtext_0.1.2                 
 [79] locfit_1.5-9.7                sp_1.6-0                     
 [81] bit_4.0.5                     whisker_0.4.1                
 [83] codetools_0.2-19              textshaping_0.3.6            
 [85] openssl_2.0.6                 crosstalk_1.2.0              
 [87] bslib_0.4.2                   mime_0.12                    
 [89] markdown_1.6                  Rcpp_1.0.10                  
 [91] gridtext_0.1.5                knitr_1.42                   
 [93] blob_1.2.4                    utf8_1.2.3                   
 [95] here_1.0.1                    BiocVersion_3.17.1           
 [97] fs_1.6.2                      listenv_0.9.0                
 [99] ggsignif_0.6.4                Matrix_1.5-4                 
[101] callr_3.7.3                   tzdb_0.3.0                   
[103] tweenr_2.0.2                  pkgconfig_2.0.3              
[105] tools_4.3.0                   cachem_1.0.8                 
[107] RSQLite_2.3.1                 viridisLite_0.4.2            
[109] DBI_1.1.3                     fastmap_1.1.1                
[111] rmarkdown_2.21                grid_4.3.0                   
[113] ica_1.0-3                     Rsamtools_2.16.0             
[115] broom_1.0.4                   sass_0.4.6                   
[117] officer_0.6.2                 BiocManager_1.30.20          
[119] carData_3.0-5                 RANN_2.6.1                   
[121] farver_2.1.1                  rtracklayer_1.60.0           
[123] cli_3.6.1                     leiden_0.4.3                 
[125] lifecycle_1.0.3               askpass_1.1                  
[127] uwot_0.1.14                   backports_1.4.1              
[129] timechange_0.2.0              gtable_0.3.3                 
[131] rjson_0.2.21                  ggridges_0.5.4               
[133] progressr_0.13.0              parallel_4.3.0               
[135] jsonlite_1.8.4                bitops_1.0-7                 
[137] bit64_4.0.5                   Rtsne_0.16                   
[139] spatstat.utils_3.0-3          zip_2.3.0                    
[141] jquerylib_0.1.4               highr_0.10                   
[143] R.utils_2.12.2                lazyeval_0.2.2               
[145] shiny_1.7.4                   htmltools_0.5.5              
[147] sctransform_0.3.5             rappdirs_0.3.3               
[149] gfonts_0.2.0                  XVector_0.40.0               
[151] gdtools_0.3.3                 RCurl_1.98-1.12              
[153] rprojroot_2.0.3               mdthemes_0.1.0               
[155] gridExtra_2.3                 igraph_1.4.2                 
[157] R6_2.5.1                      labeling_0.4.2               
[159] cluster_2.1.4                 DelayedArray_0.26.2          
[161] tidyselect_1.2.0              ProtGenerics_1.32.0          
[163] xml2_1.3.4                    fontBitstreamVera_0.1.1      
[165] car_3.1-2                     future_1.32.0                
[167] munsell_0.5.0                 KernSmooth_2.23-21           
[169] fontquiver_0.2.1              data.table_1.14.8            
[171] htmlwidgets_1.6.2             RColorBrewer_1.1-3           
[173] biomaRt_2.56.0                rlang_1.1.1                  
[175] spatstat.sparse_3.0-1         spatstat.explore_3.1-0       
[177] uuid_1.1-0                    fansi_1.0.4                  

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Adelaide
tzcode source: internal

attached base packages:
[1] splines   stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] flextable_0.9.1             biomartr_1.0.3             
 [3] BiocParallel_1.34.1         pheatmap_1.0.12            
 [5] MAST_1.26.0                 SingleCellExperiment_1.22.0
 [7] SummarizedExperiment_1.30.1 MatrixGenerics_1.12.0      
 [9] matrixStats_0.63.0          ggforce_0.4.1              
[11] randomcoloR_1.1.0.1         ggpubr_0.6.0               
[13] colorspace_2.1-0            patchwork_1.1.2            
[15] plotly_4.10.1               corrplot_0.92              
[17] SeuratObject_4.1.3          Seurat_4.3.0               
[19] DT_0.27                     ggrepel_0.9.3              
[21] cqn_1.46.0                  quantreg_5.95              
[23] SparseM_1.81                preprocessCore_1.62.1      
[25] nor1mix_1.3-0               mclust_6.0.0               
[27] magrittr_2.0.3              ggfortify_0.4.16           
[29] cowplot_1.1.1               ensembldb_2.24.0           
[31] AnnotationFilter_1.24.0     GenomicFeatures_1.52.0     
[33] AnnotationDbi_1.62.1        Biobase_2.60.0             
[35] GenomicRanges_1.52.0        GenomeInfoDb_1.36.0        
[37] IRanges_2.34.0              S4Vectors_0.38.1           
[39] AnnotationHub_3.8.0         BiocFileCache_2.8.0        
[41] dbplyr_2.3.2                BiocGenerics_0.46.0        
[43] edgeR_3.42.2                limma_3.56.1               
[45] glue_1.6.2                  pander_0.6.5               
[47] scales_1.2.1                yaml_2.3.7                 
[49] lubridate_1.9.2             forcats_1.0.0              
[51] stringr_1.5.0               dplyr_1.1.2                
[53] purrr_1.0.1                 readr_2.1.4                
[55] tidyr_1.3.0                 tibble_3.2.1               
[57] ggplot2_3.4.2               tidyverse_2.0.0            
[59] workflowr_1.7.0            

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2             progress_1.2.2               
  [3] goftest_1.2-3                 Biostrings_2.68.0            
  [5] vctrs_0.6.2                   spatstat.random_3.1-4        
  [7] digest_0.6.31                 png_0.1-8                    
  [9] git2r_0.32.0                  deldir_1.0-6                 
 [11] httpcode_0.3.0                parallelly_1.35.0            
 [13] MASS_7.3-60                   fontLiberation_0.1.0         
 [15] reshape2_1.4.4                httpuv_1.6.10                
 [17] withr_2.5.0                   xfun_0.39                    
 [19] ellipsis_0.3.2                survival_3.5-5               
 [21] commonmark_1.9.0              memoise_2.0.1                
 [23] crul_1.3                      MatrixModels_0.5-1           
 [25] systemfonts_1.0.4             ragg_1.2.5                   
 [27] zoo_1.8-12                    V8_4.3.0                     
 [29] pbapply_1.7-0                 R.oo_1.25.0                  
 [31] prettyunits_1.1.1             KEGGREST_1.40.0              
 [33] promises_1.2.0.1              httr_1.4.6                   
 [35] rstatix_0.7.2                 restfulr_0.0.15              
 [37] globals_0.16.2                fitdistrplus_1.1-11          
 [39] ps_1.7.5                      rstudioapi_0.14              
 [41] miniUI_0.1.1.1                generics_0.1.3               
 [43] base64enc_0.1-3               processx_3.8.1               
 [45] curl_5.0.0                    zlibbioc_1.46.0              
 [47] polyclip_1.10-4               GenomeInfoDbData_1.2.10      
 [49] interactiveDisplayBase_1.38.0 xtable_1.8-4                 
 [51] evaluate_0.21                 S4Arrays_1.0.4               
 [53] hms_1.1.3                     irlba_2.3.5.1                
 [55] filelock_1.0.2                ROCR_1.0-11                  
 [57] reticulate_1.28               spatstat.data_3.0-1          
 [59] lmtest_0.9-40                 later_1.3.1                  
 [61] lattice_0.21-8                spatstat.geom_3.2-1          
 [63] future.apply_1.10.0           getPass_0.2-2                
 [65] scattermore_1.0               XML_3.99-0.14                
 [67] RcppAnnoy_0.0.20              pillar_1.9.0                 
 [69] nlme_3.1-162                  compiler_4.3.0               
 [71] stringi_1.7.12                tensor_1.5                   
 [73] GenomicAlignments_1.36.0      plyr_1.8.8                   
 [75] crayon_1.5.2                  abind_1.4-5                  
 [77] BiocIO_1.10.0                 ggtext_0.1.2                 
 [79] locfit_1.5-9.7                sp_1.6-0                     
 [81] bit_4.0.5                     whisker_0.4.1                
 [83] codetools_0.2-19              textshaping_0.3.6            
 [85] openssl_2.0.6                 crosstalk_1.2.0              
 [87] bslib_0.4.2                   mime_0.12                    
 [89] markdown_1.6                  Rcpp_1.0.10                  
 [91] gridtext_0.1.5                knitr_1.42                   
 [93] blob_1.2.4                    utf8_1.2.3                   
 [95] here_1.0.1                    BiocVersion_3.17.1           
 [97] fs_1.6.2                      listenv_0.9.0                
 [99] ggsignif_0.6.4                Matrix_1.5-4                 
[101] callr_3.7.3                   tzdb_0.3.0                   
[103] tweenr_2.0.2                  pkgconfig_2.0.3              
[105] tools_4.3.0                   cachem_1.0.8                 
[107] RSQLite_2.3.1                 viridisLite_0.4.2            
[109] DBI_1.1.3                     fastmap_1.1.1                
[111] rmarkdown_2.21                grid_4.3.0                   
[113] ica_1.0-3                     Rsamtools_2.16.0             
[115] broom_1.0.4                   sass_0.4.6                   
[117] officer_0.6.2                 BiocManager_1.30.20          
[119] carData_3.0-5                 RANN_2.6.1                   
[121] farver_2.1.1                  rtracklayer_1.60.0           
[123] cli_3.6.1                     leiden_0.4.3                 
[125] lifecycle_1.0.3               askpass_1.1                  
[127] uwot_0.1.14                   backports_1.4.1              
[129] timechange_0.2.0              gtable_0.3.3                 
[131] rjson_0.2.21                  ggridges_0.5.4               
[133] progressr_0.13.0              parallel_4.3.0               
[135] jsonlite_1.8.4                bitops_1.0-7                 
[137] bit64_4.0.5                   Rtsne_0.16                   
[139] spatstat.utils_3.0-3          zip_2.3.0                    
[141] jquerylib_0.1.4               highr_0.10                   
[143] R.utils_2.12.2                lazyeval_0.2.2               
[145] shiny_1.7.4                   htmltools_0.5.5              
[147] sctransform_0.3.5             rappdirs_0.3.3               
[149] gfonts_0.2.0                  XVector_0.40.0               
[151] gdtools_0.3.3                 RCurl_1.98-1.12              
[153] rprojroot_2.0.3               mdthemes_0.1.0               
[155] gridExtra_2.3                 igraph_1.4.2                 
[157] R6_2.5.1                      labeling_0.4.2               
[159] cluster_2.1.4                 DelayedArray_0.26.2          
[161] tidyselect_1.2.0              ProtGenerics_1.32.0          
[163] xml2_1.3.4                    fontBitstreamVera_0.1.1      
[165] car_3.1-2                     future_1.32.0                
[167] munsell_0.5.0                 KernSmooth_2.23-21           
[169] fontquiver_0.2.1              data.table_1.14.8            
[171] htmlwidgets_1.6.2             RColorBrewer_1.1-3           
[173] biomaRt_2.56.0                rlang_1.1.1                  
[175] spatstat.sparse_3.0-1         spatstat.explore_3.1-0       
[177] uuid_1.1-0                    fansi_1.0.4